# Large-Scale Pretraining
- **Bart Large Teaser De V2** — bettertextapp · Large Language Model · Transformers · 123 downloads · 0 likes
  A large German text-processing model based on the BART architecture, suited to a variety of natural language processing tasks.
- **Bart Large Paraphrase Generator En De V2** — bettertextapp · Machine Translation · Transformers · 121 downloads · 0 likes
  A large-scale English-German paraphrase generation model based on the BART architecture.
- **Instella 3B** — amd · License: Other · Large Language Model · Transformers · 3,048 downloads · 34 likes
  AMD's fully open family of 3-billion-parameter language models, trained on Instinct MI300X GPUs and outperforming open models of similar scale.
- **Vit So400m Patch16 Siglip 512.v2 Webli** — timm · License: Apache-2.0 · Text-to-Image · Transformers · 2,766 downloads · 0 likes
  A vision Transformer based on SigLIP 2, designed for image feature extraction and suited to multilingual vision-language tasks.
- **Longva 7B TPO** — ruili0 · License: MIT · Video-to-Text · Transformers · 225 downloads · 1 like
  LongVA-7B-TPO is a video-text model derived from LongVA-7B through temporal preference optimization, and it excels at long-video understanding tasks.
- **Videollama2.1 7B 16F Base** — DAMO-NLP-SG · License: Apache-2.0 · Video-to-Text · Transformers · English · 179 downloads · 1 like
  VideoLLaMA2.1 is an upgraded version of VideoLLaMA2 that focuses on strengthening spatiotemporal modeling and audio understanding in large video-language models.
- **Depth Anything V2 Base** — depth-anything · 3D Vision · English · 66.95k downloads · 17 likes
  Depth Anything V2 is currently the most powerful monocular depth estimation (MDE) model, trained on 595,000 synthetically annotated images and more than 62 million real, unannotated images.
- **4M 21 L** — EPFL-VILAB · License: Other · Multimodal Fusion · 49 downloads · 3 likes
  4M is an "any-to-any" foundation-model training framework, extended to many modalities through tokenization and masking techniques.
- **Chronos T5 Large** — autogluon · License: Apache-2.0 · Climate Model · Transformers · 59.18k downloads · 6 likes
  Chronos is a family of pretrained time-series forecasting models built on language-model architectures; it supports probabilistic forecasting by converting time series into token sequences for training.
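The Chronos entry above hinges on one idea: a real-valued series becomes a token sequence a language model can be trained on. As a rough illustration of that idea — not Chronos's actual tokenizer; the bin count, value range, and mean-scaling scheme here are assumptions — a minimal sketch:

```python
import numpy as np

def tokenize_series(series, n_bins=64, low=-3.0, high=3.0):
    """Mean-scale a real-valued series, then quantize it into integer
    tokens via uniform binning (toy stand-in for a Chronos-style tokenizer)."""
    series = np.asarray(series, dtype=float)
    scale = np.mean(np.abs(series)) or 1.0          # avoid division by zero
    edges = np.linspace(low, high, n_bins - 1)      # n_bins - 1 edges => n_bins bins
    tokens = np.digitize(series / scale, edges)     # ids in [0, n_bins - 1]
    return tokens, scale

def detokenize_series(tokens, scale, n_bins=64, low=-3.0, high=3.0):
    """Approximate inverse: map each token id back to its bin center."""
    edges = np.linspace(low, high, n_bins - 1)
    centers = np.concatenate(([edges[0]], (edges[:-1] + edges[1:]) / 2, [edges[-1]]))
    return centers[np.asarray(tokens)] * scale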
- **Vitamin XL 256px** — jienengchen · License: MIT · Text-to-Image · Transformers · 655 downloads · 1 like
  ViTamin-XL-256px is a vision-language model based on the ViTamin architecture, designed for efficient visual feature extraction and multimodal tasks, with support for high-resolution image processing.
- **Vitamin XL 384px** — jienengchen · License: MIT · Image-to-Text · Transformers · 104 downloads · 20 likes
  ViTamin-XL-384px is a large vision-language model based on the ViTamin architecture, built specifically for vision-language tasks, with support for high-resolution image processing and multimodal feature extraction.
- **Pile T5 Base** — EleutherAI · Large Language Model · Transformers · English · 50 downloads · 19 likes
  Pile-T5 Base is an encoder-decoder model trained on The Pile with the T5x library, run for 2 million steps on an MLM objective over roughly 2 trillion tokens.
- **Stt Fr Fastconformer Hybrid Large Pc** — nvidia · Speech Recognition · French · 1,331 downloads · 5 likes
  A French automatic speech recognition model based on the FastConformer architecture, combining Transducer and CTC decoders for high accuracy and multi-domain adaptability.
- **Chinese Clip Vit Base Patch16** — OFA-Sys · Text-to-Image · Transformers · 49.02k downloads · 104 likes
  The base version of Chinese CLIP, using ViT-B/16 as the image encoder and RoBERTa-wwm-base as the text encoder, trained on roughly 200 million Chinese image-text pairs.
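A dual-encoder model like the Chinese CLIP entry above ranks candidate captions for an image by cosine similarity between the two encoders' embeddings. A minimal numpy sketch of that scoring step, using stand-in embedding vectors (the real model's embeddings come from ViT-B/16 and RoBERTa-wwm-base, and the 0.07 temperature is an assumption borrowed from common CLIP practice):

```python
import numpy as np

def clip_scores(image_emb, text_embs, temperature=0.07):
    """CLIP-style retrieval: L2-normalize both sides, take cosine
    similarities, and softmax over the candidate texts."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    exp = np.exp(logits - logits.max())        # stable softmax
    return exp / exp.sum()
```

The output is a probability distribution over the candidate texts; the same machinery run the other way (one text against many images) gives text-to-image retrieval.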
- **Bertovski** — MaCoCu · Large Language Model · Other · 28 downloads · 1 like
  BERTovski is a large language model pretrained on Bulgarian and Macedonian text using the RoBERTa architecture, produced by the MaCoCu project.
- **Wav2vec2 Large Tedlium** — sanchit-gandhi · License: Apache-2.0 · Speech Recognition · English · 58 downloads · 1 like
  A Wav2Vec2 large speech recognition model fine-tuned on the TED-LIUM corpus for English speech-to-text conversion.
- **Vision Perceiver Fourier** — deepmind · License: Apache-2.0 · Image Classification · Transformers · 1,168 downloads · 2 likes
  Perceiver IO is a general-purpose Transformer architecture that can process multiple modalities; this variant is configured for image classification and pretrained on ImageNet.
- **Deberta Base** — kamalkraj · License: MIT · Large Language Model · Transformers · English · 287 downloads · 0 likes
  DeBERTa (Decoding-enhanced BERT with disentangled attention) improves on BERT and RoBERTa with a disentangled attention mechanism and excels at natural language understanding tasks.
- **Albert Xxlarge V1** — albert · License: Apache-2.0 · Large Language Model · Transformers · English · 930 downloads · 5 likes
  ALBERT XXLarge v1 is a Transformer model pretrained on English text with a masked language modeling (MLM) objective, and features cross-layer parameter sharing.
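Several entries above (Pile-T5, ALBERT) cite a masked language modeling (MLM) objective: hide a fraction of the input tokens and train the model to predict them. A toy sketch of the masking step — the 15% rate follows BERT-style convention, and real pipelines additionally use subword tokenizers and random/keep replacements, so this is an illustration only:

```python
import random

MASK = "[MASK]"

def mask_for_mlm(tokens, rate=0.15, seed=0):
    """Replace roughly `rate` of the tokens with [MASK]; labels hold the
    original token at masked positions and None everywhere else."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(MASK)
            labels.append(tok)      # model must recover this token
        else:
            masked.append(tok)
            labels.append(None)     # position contributes no loss
    return masked, labels
```

During pretraining, the loss is computed only at the positions whose label is not None.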